E04 - Affymetrix microarrays (Analysis of Gene Expression @ UCT Prague)
Copy, please, these files and directories to your personal directory:
cp -r ~/shared/AGE2021/Exercises/E04-microarrays ~/AGE2021/ExercisesAssignment
Task 1
Implement the select_var_features(m, n_top_features) in age_library.R. This functions will subset matrix ms rows by n_top_features with the highest variance across the columns (samples) and arrange the rows of subsetted matrix by decreasing variance. Also modify plot_pca() and plot_heatmap() so they will accept the n_top_features parameter, pass it to select_var_features() and do what should they do on the subsetted matrix m.
Then make a Rmd file named n_top_features.Rmd in which you will:
- Load the Rds file with
norm_dataobject located inE04-microarrays/norm_data.Rds, and sourceage_library.Rwith implementedselect_var_features()and other functions. - Make a PCA plot and heatmap using
n_top_features = 1000and colored (annotated) bysample_group.
Then render this Rmd file to n_top_features.html.
In this exercise, there are several cases of
n_top_featuresusage in PCA and heatmaps.
Task 2
The second assignment will be to implement a “signpost” (dea_signpost.Rmd) and template (dea_table_template.Rmd) Rmd files, and R script (report.R) for reporting of differential expression analysis results.
In the report.R script load the fit object located in E04-microarrays/fit.Rds and for each of the contrasts render the dea_table_template.Rmd file to <contrast_name>.html which will contain the contrast results table obtained from topTable(). Filter the results such that \(FDR < 0.1\) and \(|LFC| > 1\). Then render the dea_signpost.Rmd file to dea_signpost.html which will contain a table with links to the rendered HTML files with contrast results.
A simplified procedure:
report.R- Load the
fitobject. - Loop over the contrast results as
res:- Render
dea_table_template.Rmdwithrestable to<contrast_name>.html.
- Render
- Render
dea_signpost.Rmdtodea_signpost.htmlwhich will contain a table with contrast names and links to<contrast_name>.htmlfiles.
- Load the
The implementation is very similar to the loop used in ReportingTools section, but instead of using ReportingTools functions you will basically use rmarkdown::render(). Detailed instructions are given in Custom reporting of DEGs section.
Please, return the assignment in this form:
- In the header of all Rmd files, put your name and e-mail to the
authorparameter. - Copy the files to a directory named
E04-microarrays_assignment_<your_name>: all files needed to reproduce your results (R and Rmd files), and rendered HTMLs. Put this directory to a ZIP archive namedE04-microarrays_assignment_<your_name>.zip.- In the rendered HTMLs there should be included all code chunks with full output (warnings, messages, etc.).
- Upload the ZIP file to the appropriate assignment in MS Teams.
Introduction
From Wikipedia:
A DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles (10−12 moles) of a specific DNA sequence, known as probes (or reporters or oligos). These can be a short section of a gene or other DNA element that are used to hybridize a cDNA or cRNA (also called anti-sense RNA) sample (called target) under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine relative abundance of nucleic acid sequences in the target.
A typical protocol for microarray experiment. mRNA is isolated from the sample and reverse transcribed to cDNA which is labelled by fluorescent dye. The next steps are hybridization, in which cDNA hydridizes with probes on microarray, and the consequent washing, where non-specific sequences are removed, i.e. only strongly paired strands will remain hybridized. The final wet-lab step is scanning in which the fluorescence intensity of probe spots is measured. Total strength of the signal, from a spot (feature), depends upon the amount of target sample binding to the probes present on that spot. Microarrays use relative quantitation in which the intensity of a feature is compared to the intensity of the same feature under a different condition, and the identity of the feature is known by its position. Intensities are summarised and analysis of data usually starts with normalization and technical quality control, and is followed by exploratory analysis and testing for differential expression. Image adapted from Wikipedia (1, 2).
For a comphrehensive overview of Affymetrix microarrays data analysis check An end to end workflow for differential gene expression using Affymetrix microarrays.
Our experiment
Data come from experiment where rats were separately treated with 127 compounds dissolved in corn oil. To control rats only corn oil was given. The goal was to determine the carcinogenity of these compounds. We will be working only with a subset, where are some well-known drugs:
- Acetaminophen
- Aspirin
- Diclofenac
- Ibuprofen
For each drug, we have 3 rats (= 3 biological replicates). For controls we have 6 rats.
The used chip is Affymetrix Rat Genome 230 2.0 Array (Rat230_2).
Libraries
library(conflicted)
conflict_prefer("expand", "S4Vectors")
library(here)
library(emo)
library(tidyverse)
library(glue)
library(oligo)
library(limma)
conflict_prefer("plotMA", "limma")
library(ReportingTools)
library(lattice)
library(sva)
source(here("age_library.R"))Config
Input files:
BASE_DIR <- here("E04-microarrays")
DATA_DIR <- here(BASE_DIR, "data")
EXPERIMENT_1_DATA_DIR <- here(DATA_DIR, "E-GEOD-57822")
EXPERIMENT_1_SAMPLE_SHEET_FILE <- here(EXPERIMENT_1_DATA_DIR, "E-GEOD-57822.sdrf.txt")
# This experiment will be used to demonstrate the batch effect.
EXPERIMENT_2_DATA_DIR <- here(DATA_DIR, "E-GEOD-68065")
EXPERIMENT_2_SAMPLE_SHEET_FILE <- here(EXPERIMENT_2_DATA_DIR, "E-GEOD-68065.sdrf.txt")Parameters for differential expression analysis:
REPORT_DIR <- here(BASE_DIR, "html_reports")
REPORT_N_GENES <- 50
REPORT_LFC_THRESHOLD <- 1
REPORT_P_VALUE_THRESHOLD <- 0.1
REPORT_P_VALUE_ADJUST_METHOD <- "fdr"Reading the data
Phenotypical data
We need only few columns from the sample sheet. You can explore the /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/E-GEOD-57822.sdrf.xlsx Excel sheet.
pheno_data <- readr::read_delim(EXPERIMENT_1_SAMPLE_SHEET_FILE, delim = "\t", progress = FALSE) %>%
dplyr::select(1, 48, 41) %>%
magrittr::set_colnames(c("sample_name", "sample_group", "cel_file"))
pheno_dataLet’s adjust the pheno_data a little bit more:
pheno_data <- pheno_data %>%
dplyr::mutate(
sample_name = str_remove(sample_name, " 1"),
sample_group = dplyr::if_else(sample_group == "not specified", "control", sample_group) %>%
str_to_lower() %>%
factor() %>%
relevel("control")
) %>%
as.data.frame() %>%
magrittr::set_rownames(.$sample_name)
pheno_dataCEL files
Microarray manufacturers use their own format to store intensity data, usually in binary form. Affymetrix is using CEL format.
oligo is a package for analyzing microarrays and supports Affymetrix (user guide). Raw data are stored in object of class ExpressionFeatureSet.
cel_files <- glue("{EXPERIMENT_1_DATA_DIR}/{pheno_data$cel_file}")
raw_data <- read.celfiles(cel_files)## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392637_94676.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392635_94671.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392459_94436.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1394076_96744.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1394035_96701.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1394026_96692.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392691_94740.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392598_94590.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392512_94500.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393683_96263.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393649_96202.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393630_96182.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1394061_96729.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1394040_96706.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393116_95535.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393085_95478.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392575_94565.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393124_95554.CEL
raw_data is object of class ExpressionFeatureSet, which is designated to store the raw (non-summarised, non-normalized) data from an experiment. It is derived from ExpressionSet class, of which object we will obtain after data normalization. ExpressionSet is one of the core Bioconductor’s classes found in Biobase package. It is used as a container for high-throughput assays and experimental metadata. You can read its manual here.
The basic structure of ExpressionSet object. There are three matrices: intensity (expression) matrix (rows = probes, columns = samples), phenotypical (sample) data, and feature (probe) data, which adds additional information about probes (probe ID, gene ID, symbol, name, etc.). The expression matrix is accesible via exprs(). The sample data are accesible via pData() (or phenoData(), see below). The feature data are accesible via fData() (or featureData(), see below). It is important that the latter two matrices are linked to the expression matrix via their rownames: rownames of the phenotypical matrix match colnames of the expression matrix, and rownames of the feature matrix match rownames of the expression matrix. Anytime you subset any of the matrices, the rest will be automatically subsetted (“synchronization”).
There is also other data stored in ExpressionSet, such as experimentData, but that is not important for now. You can find it in the manual above or see ?ExpressionSet. Alternatively, you can look at methods implemented by a class, in this case the ExpressionSet:
methods(class = "ExpressionSet")## [1] [ [[ [[<- $ $<- abstract annotation annotation<- as.data.frame as.matrix
## [11] assayData assayData<- boxplot classVersion classVersion<- coerce combine description description<- dim
## [21] dimnames dimnames<- dims esApply experimentData experimentData<- exprs exprs<- fData fData<-
## [31] featureData featureData<- featureNames featureNames<- fvarLabels fvarLabels<- fvarMetadata fvarMetadata<- genefinder getNetAffx
## [41] ggplot hist initialize isCurrent isVersioned makeDataPackage MAplot notes notes<- nsFilter
## [51] pData pData<- phenoData phenoData<- preproc preproc<- protocolData protocolData<- pubMedIds pubMedIds<-
## [61] rowFtests rowMedians rowpAUCs rowQ rowttests sampleNames sampleNames<- show storageMode storageMode<-
## [71] updateObject updateObjectTo varLabels varLabels<- varMetadata varMetadata<- write.exprs
## see '?methods' for accessing help and source code
You can see all the stored data in the raw_data summary:
raw_data## ExpressionFeatureSet (storageMode: lockedEnvironment)
## assayData: 695556 features, 18 samples
## element names: exprs
## protocolData
## rowNames: GSM1392637_94676.CEL GSM1392635_94671.CEL ... GSM1393124_95554.CEL (18 total)
## varLabels: exprs dates
## varMetadata: labelDescription channel
## phenoData
## rowNames: GSM1392637_94676.CEL GSM1392635_94671.CEL ... GSM1393124_95554.CEL (18 total)
## varLabels: index
## varMetadata: labelDescription channel
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.rat230.2
Rownames and colnames of ExpressionSet can be simply retrieved by the usual functions, but for better readability is better to use the alternative Biobase functions:
# or colnames(raw_data)
sampleNames(raw_data)## [1] "GSM1392637_94676.CEL" "GSM1392635_94671.CEL" "GSM1392459_94436.CEL" "GSM1394076_96744.CEL" "GSM1394035_96701.CEL" "GSM1394026_96692.CEL" "GSM1392691_94740.CEL"
## [8] "GSM1392598_94590.CEL" "GSM1392512_94500.CEL" "GSM1393683_96263.CEL" "GSM1393649_96202.CEL" "GSM1393630_96182.CEL" "GSM1394061_96729.CEL" "GSM1394040_96706.CEL"
## [15] "GSM1393116_95535.CEL" "GSM1393085_95478.CEL" "GSM1392575_94565.CEL" "GSM1393124_95554.CEL"
# or rownames(raw_data)
featureNames(raw_data) %>% head()## [1] "1" "2" "3" "4" "5" "6"
Rownames are now just row numbers, because we already haven’t done the probe summarisation.
Sample and feature data are actually not a basic data.frame, but a more complex AnnotatedDataFrame. That is a “special data frame” which consists of two parts: dataframe describing metadata of samples and dataframe describing this metadata (i.e. its columns).
Let’s create such an AnnotatedDataFrame. First we change the sample names to a more neat form:
sampleNames(raw_data) <- pheno_data$sample_nameNow we create metadata for AnnotatedDataFrame:
metadata <- data.frame(
labelName = colnames(pheno_data),
labelDescription = c("", "Used drug.", "")
)
metadataAnd finally the AnnotatedDataFrame:
(pheno_data <- AnnotatedDataFrame(data = pheno_data, varMetadata = metadata))## An object of class 'AnnotatedDataFrame'
## rowNames: GSM1392637 GSM1392635 ... GSM1393124 (18 total)
## varLabels: sample_name sample_group cel_file
## varMetadata: labelName labelDescription
We also combine pheno_data with existing pheno data, as there is some existing information:
phenoData(raw_data)## An object of class 'AnnotatedDataFrame'
## rowNames: GSM1392637 GSM1392635 ... GSM1393124 (18 total)
## varLabels: index
## varMetadata: labelDescription channel
pData(raw_data) %>% head()phenoData(raw_data) <- Biobase::combine(phenoData(raw_data), pheno_data)
phenoData(raw_data)## An object of class 'AnnotatedDataFrame'
## rowNames: GSM1392637 GSM1392635 ... GSM1393124 (18 total)
## varLabels: index sample_name sample_group cel_file
## varMetadata: labelDescription channel labelName
pData(raw_data) %>% head()Look at the difference between phenoData() and pData(): the former returns AnnotatedDataFrame, while the latter data.frame.
Accessing the metadata:
varMetadata(raw_data)Accessing the expression matrix:
exprs(raw_data)[1:5, 1:5]## GSM1392637 GSM1392635 GSM1392459 GSM1394076 GSM1394035
## 1 115 103 74 224 144
## 2 9607 11337 7979 9620 7917
## 3 101 158 122 252 189
## 4 12162 12995 9640 9952 8179
## 5 77 90 78 89 58
Subsetting works as for normal matrix, but will subset all matrices:
raw_data_subset <- raw_data[1:5, 1:5]
raw_data_subset## ExpressionFeatureSet (storageMode: lockedEnvironment)
## assayData: 5 features, 5 samples
## element names: exprs
## protocolData
## rowNames: GSM1392637 GSM1392635 ... GSM1394035 (5 total)
## varLabels: exprs dates
## varMetadata: labelDescription channel
## phenoData
## rowNames: GSM1392637 GSM1392635 ... GSM1394035 (5 total)
## varLabels: index sample_name sample_group cel_file
## varMetadata: labelDescription channel labelName
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.rat230.2
exprs(raw_data_subset)## GSM1392637 GSM1392635 GSM1392459 GSM1394076 GSM1394035
## 1 115 103 74 224 144
## 2 9607 11337 7979 9620 7917
## 3 101 158 122 252 189
## 4 12162 12995 9640 9952 8179
## 5 77 90 78 89 58
pData(raw_data_subset)fData(raw_data_subset)Technical quality control
Now we will assess whether the experiment was performed correctly at the technical level.
Assumption: most of the genes do not change expression across the samples.
Pseudo-image
Pseudo-image plots are used to assess the spatial distribution of the data on the chips. Due to the magnitude of the readings, pseudo-images using data on the original scale often mask spatial features that may be present on the arrays. This is why we recommend the use of the \(log_2\) scale.
image(raw_data, which = 12, transfo = log2)You should see something like a random noise. In the case of a larger area of the same color, there could be some technical problem.
MA plot
\(j, k\) … arrays
\(i\) … probe
Difference in log-intensities of probe \(i\): \(M_i=log_2(Y_{i,j}/Y_{i,k})\)
Average log-intensity of probe \(i\): \(A_i=\frac{1}{2}log_2(Y_{i,j}*Y_{i,k})\)
Ideally, \(M = 0\) should apply for all probes in all samples, due to the assumption that most of the genes do not change expression across the samples.
Here, we plot MA for each combination of three samples:
MAplot(raw_data[, 12:14], pairs = TRUE)We can also summarise sample groups. By default, median is used to summarise probe intensities across the samples.
MAplot(raw_data, groups = pData(raw_data)$sample_group, pairs = TRUE)Boxplots
Ideally, we should see similar boxplots.
boxplot(raw_data, nsample = nrow(raw_data))Probe level model
We will fit PLM using a robust regression which allows to calculate the standard errors of the coefficients.
\(log_{2}(Y_{i,j})=\beta_j + \alpha_i + \epsilon_{i,j}\)
\(Y_{i,j}\) … background corrected intensity of probe \(i\) in sample \(j\)
\(\beta_j\) … true transcript intensity of sample \(j\)
\(\alpha_i\) … systematic bias of probe \(i\)
\(\epsilon_{i,j}\) … random noise of probe \(i\) in sample \(j\)
fit_plm <- fitProbeLevelModel(raw_data)Let’s see its properties:
image(fit_plm, which = 12)image(fit_plm, which = 12, type = "sign.residuals")Again, as in the case of pseudo-image, you should see something like a random noise. In the case of a larger area of the same color, there could be some technical problem.
Relative log expression (RLE)
\(y_{i,j}\) … intensity of probe \(i\) in sample \(j\)
For each probe \(i\) we calculate its median intensity across the samples: \(median(y_{i,*})\)
Then we calculate deviations from this median: \(y_{i,j} - median(y_{i,*})\)
Ideally, RLE values should be spread around \(0\).
RLE(fit_plm)Normalized unscaled standard errors (NUSE)
We can look at the standard error estimates from PLM, for each probe on each array \(j\). They are standardized so that the median standard error across arrays is \(1\) for each gene.
\(NUSE(\beta_j)=\frac{SE(\beta_j)}{median_j(\beta_j)}\)
Ideally, boxplots should be centered around \(1\). Samples having median above \(1.1\) and/or higher spread can be considered as low quality ones (source).
NUSE(fit_plm)Normalization and probe annotation
As we are working with Affymetrix microarrays, we use the RMA method for normalization:
norm_data <- rma(raw_data)## Background correcting
## Normalizing
## Calculating Expression
Now we only know the probe IDs, but not the gene IDs they correspond to:
featureNames(norm_data) %>% head()## [1] "1367452_at" "1367453_at" "1367454_at" "1367455_at" "1367456_at" "1367457_at"
And so we use a DB package rat2302.db corresponding to our chip to annotate the probes.
All of those annotation packages use SQLite database as a backend and have a common interface called AnnotationDbi.
Some methods in AnnotationDbi have names similar to those from other packages (like dplyr::select()). So always rather use the AnnotationDbi:: prefix.
Note: because SQLite is a very simple database, it allows only one session (i.e. database file can be opened by only one user at the time). So, by normal circumstances, it is not possible to use parallel processing when working with DB packages.
library(rat2302.db)
rat2302.db## ChipDb object:
## | DBSCHEMAVERSION: 2.1
## | Db type: ChipDb
## | Supporting package: AnnotationDbi
## | DBSCHEMA: RATCHIP_DB
## | ORGANISM: Rattus norvegicus
## | SPECIES: Rat
## | MANUFACTURER: Affymetrix
## | CHIPNAME: Rat Genome 230 2.0 Array
## | MANUFACTURERURL: http://www.affymetrix.com/support/technical/byproduct.affx?product=rat230-20
## | EGSOURCEDATE: 2015-Sep27
## | EGSOURCENAME: Entrez Gene
## | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | CENTRALID: ENTREZID
## | TAXID: 10116
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
## | GOSOURCEDATE: 20150919
## | GOEGSOURCEDATE: 2015-Sep27
## | GOEGSOURCENAME: Entrez Gene
## | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | KEGGSOURCENAME: KEGG GENOME
## | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
## | KEGGSOURCEDATE: 2011-Mar15
## | GPSOURCENAME: UCSC Genome Bioinformatics (Rattus norvegicus)
## | GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/rn6
## | GPSOURCEDATE: 2014-Aug1
## | ENSOURCEDATE: 2015-Jul16
## | ENSOURCENAME: Ensembl
## | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
## | UPSOURCENAME: Uniprot
## | UPSOURCEURL: http://www.uniprot.org/
## | UPSOURCEDATE: Thu Oct 1 22:36:23 2015
class(rat2302.db)## [1] "ChipDb"
## attr(,"package")
## [1] "AnnotationDbi"
Let’s show how to work with AnnotationDbi objects.
To show DB columns:
AnnotationDbi::columns(rat2302.db)## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GO"
## [12] "GOALL" "IPI" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
## [23] "UNIGENE" "UNIPROT"
We can show a description of each column:
help("UNIGENE")The primary keys of this DB are probe IDs:
AnnotationDbi::keys(rat2302.db) %>% head()## [1] "1367452_at" "1367453_at" "1367454_at" "1367455_at" "1367456_at" "1367457_at"
But there are more key columns to use:
AnnotationDbi::keytypes(rat2302.db)## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GO"
## [12] "GOALL" "IPI" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
## [23] "UNIGENE" "UNIPROT"
And finally, we can select data using our list of keys. We also specify which columns we want to return and what is our keytype:
feature_data <- AnnotationDbi::select(
rat2302.db,
# Which columns we want to retrieve from DB.
columns = c("PROBEID", "ENSEMBL", "SYMBOL", "GENENAME", "ENTREZID"),
# Vector of keys to use. Each key has a matching row in columns.
keys = featureNames(norm_data),
# Our key type is ID of probe.
keytype = "PROBEID"
)
head(feature_data)Some probes are targeting multiple genes:
janitor::get_dupes(feature_data, PROBEID) %>% head()How is it possible? Well, I am not sure 🙂 But one explanation could be that in the time microarray probes were designed, they were really targeting a single gene, but some time later, a more precise genome annotation was released.
We have two options: to take the first gene, or to summarise gene information to single row. In my opinion, the second option is better, as you won’t lose any information. On the other hand, you could have some problems, as, for example, ENSEMBL ID will look like this: ENSRNOG00000013741, ENSRNOG00000042523, and that cannot be used, for example, in joining operations.
For the sake of completeness, we will try both options, but go further in the analysis with the first one.
Let’s start with the second option, as we won’t use it later. We group feature_data by PROBEID and apply function to each of its other four columns. This function joins values by ,.
feature_data_sum <- feature_data %>%
dplyr::group_by(PROBEID) %>%
dplyr::summarise(dplyr::across(c(ENSEMBL, SYMBOL, GENENAME, ENTREZID), ~ str_c(., collapse = ", ")))
head(feature_data_sum)And now the first option. We simply drop duplicates with the dplyr::distinct() function. We specify PROBEID column to be used for determining the uniqueness.
feature_data <- dplyr::distinct(feature_data, PROBEID, .keep_all = TRUE)Now we can filter norm_data and assign feature data to it:
norm_data <- norm_data[feature_data$PROBEID, ]
if (any(feature_data$PROBEID != featureNames(norm_data)))
stop("Feature data mismatch.")
fData(norm_data) <- feature_data
annotation(norm_data) <- "rat2302.db"Feature data are also stored in ExpressionSet as AnnotatedDataFrame:
featureData(norm_data)## An object of class 'AnnotatedDataFrame'
## featureNames: 1 2 ... 31099 (31099 total)
## varLabels: PROBEID ENSEMBL ... ENTREZID (5 total)
## varMetadata: labelDescription
But it is not needed to directly assign AnnotatedDataFrame - in the step above, we have assigned just the dataframe with fData(norm_data) <- feature_data
Of note, some genes are targeted by multiple probes (if we take ENSEMBL ID as unique identifier):
janitor::get_dupes(fData(norm_data), ENSEMBL) %>% head(10)Depending on the situation, it is sometimes better to summarise such probes prior to downstream analyses.
However, rat genes are not so well annotated and many ENSEMBL IDs are NA:
table(is.na(fData(norm_data)$ENSEMBL))##
## FALSE TRUE
## 20777 10322
Prior to probe averaging, we have to drop probes with unknown ENSEMBL IDs. Depending on the situation, this could be unwanted, as, in our case, we will lose about one third of probes.
norm_data_avg <- norm_data
fdata_avg <- dplyr::filter(fData(norm_data), !is.na(ENSEMBL))In qPCR we summarised technical replicates by the function limma::avearrays() - a similar function operating on matrix rows is limma::avereps():
e_avg <- limma::avereps(exprs(norm_data)[fdata_avg$PROBEID, ], fdata_avg$ENSEMBL)
fdata_avg <- dplyr::distinct(fdata_avg, ENSEMBL, .keep_all = TRUE) %>%
magrittr::set_rownames(.$ENSEMBL)
norm_data_avg <- ExpressionSet(e_avg, phenoData = phenoData(norm_data), featureData = AnnotatedDataFrame(fdata_avg))
norm_data_avg## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 14021 features, 18 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: GSM1392637 GSM1392635 ... GSM1393124 (18 total)
## varLabels: index sample_name sample_group cel_file
## varMetadata: labelDescription channel labelName
## featureData
## featureNames: ENSRNOG00000003670 ENSRNOG00000033426 ... ENSRNOG00000017879 (14021 total)
## fvarLabels: PROBEID ENSEMBL ... ENTREZID (5 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
Just for control, calculate means of one of the duplicated probes and compare it with the matching row in norm_data_avg:
(x <- exprs(norm_data[c("1368344_at", "1370760_a_at"), ]))## GSM1392637 GSM1392635 GSM1392459 GSM1394076 GSM1394035 GSM1394026 GSM1392691 GSM1392598 GSM1392512 GSM1393683 GSM1393649 GSM1393630 GSM1394061 GSM1394040 GSM1393116
## 1368344_at 2.953731 3.203772 3.065683 3.060378 3.180770 3.198330 3.296952 3.014342 3.161879 3.301956 2.989474 3.168463 3.066366 3.061917 3.218393
## 1370760_a_at 3.779222 3.925204 4.086260 3.768364 3.827416 3.651712 3.628400 3.727738 3.874826 3.682873 4.001225 3.874488 3.802249 3.849154 4.101256
## GSM1393085 GSM1392575 GSM1393124
## 1368344_at 3.065507 3.163656 3.419277
## 1370760_a_at 3.871475 3.577619 3.943491
(x <- colMeans(x))## GSM1392637 GSM1392635 GSM1392459 GSM1394076 GSM1394035 GSM1394026 GSM1392691 GSM1392598 GSM1392512 GSM1393683 GSM1393649 GSM1393630 GSM1394061 GSM1394040 GSM1393116 GSM1393085
## 3.366477 3.564488 3.575971 3.414371 3.504093 3.425021 3.462676 3.371040 3.518353 3.492415 3.495350 3.521475 3.434308 3.455536 3.659824 3.468491
## GSM1392575 GSM1393124
## 3.370637 3.681384
(y <- exprs(norm_data_avg["ENSRNOG00000000007", ]))## GSM1392637 GSM1392635 GSM1392459 GSM1394076 GSM1394035 GSM1394026 GSM1392691 GSM1392598 GSM1392512 GSM1393683 GSM1393649 GSM1393630 GSM1394061 GSM1394040
## ENSRNOG00000000007 3.366477 3.564488 3.575971 3.414371 3.504093 3.425021 3.462676 3.37104 3.518353 3.492415 3.49535 3.521475 3.434308 3.455536
## GSM1393116 GSM1393085 GSM1392575 GSM1393124
## ENSRNOG00000000007 3.659824 3.468491 3.370637 3.681384
stopifnot(all(x == y))Biological quality control
Now we are ready to look at some biology. It is time to reuse your functions 🙂
groups <- pData(norm_data)$sample_group
names(groups) <- sampleNames(norm_data)
plot_hc(exprs(norm_data), color_by = groups, color_by_lab = "Sample Group")plot_pca(exprs(norm_data), sample_data = pData(norm_data), n_top_features = 1000, color_by = "sample_group", plot_type = "multi")$plotHowever, expression matrices tend to be quite noisy, e.g. a lot of features are not expressed at all or they are constant for all samples. So better is to compute the PCA on a subset of top N most variable features (probes in our case):
plot_pca(exprs(norm_data), sample_data = pData(norm_data), n_top_features = 1000, color_by = "sample_group", plot_type = "multi")$plotFor a heatmap, 1000 probes are enough. Don’t use a big number of probes, or your browser will freeze 😉
plot_heatmap(
exprs(norm_data)[1:1000, ],
z_score = TRUE,
column_annotation = dplyr::select(pData(norm_data), sample_group),
title = "Affymetrix",
legend_title = "z-score",
show_row_names = FALSE
)But again, a better way is to subset our expression matrix to contain only top N most variable probes:
plot_heatmap(
exprs(norm_data),
n_top_features = 1000,
z_score = TRUE,
column_annotation = dplyr::select(pData(norm_data), sample_group),
title = "Affymetrix",
legend_title = "z-score",
show_row_names = FALSE
)Differential expression analysis (DEA)
DEA tells us which genes are differentially expressed between conditions. As in the qPCR, this is, again, a relative measure: for example, we can say that a gene is third times more expressed in treatment group, relative to control group.
In the qPCR, we were using the simple t-test to test for the differential expression. We could use that, because qPCR data are small enough to do so. On the other hand, data from the high-throughput methods, such as microarrays and RNA-seq, are much bigger and one of their properties is that the number of samples is always much smaller than the number of features. For that reason, we have to do DEA using a more complex statistical method: linear models.
limma
For DEA we will be using limma. It is the most known package (in top 20 packages in Bioconductor) for DEA of microarray data and was probably the first which has used linear models for that.
We will fit a linear model for each probe and improve estimates of standard errors using the empirical Bayes approach. Because we have five sample groups (the first one being control), our model looks like this:
\(Y_i = \beta_0 + \beta_1 x_{i,1} + \dotso + \beta_4 x_{i,4} + \epsilon_i, i = 1 \dotso N\)
where \(x\) are indicator variables and \(i\) is a sample index. This can be rewritten to a matrix form as:
\[ \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N\end{pmatrix} = \begin{pmatrix} 1 & x_{1,1} & \dotso & x_{1,4} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N,1} & \dotso & x_{N,4} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_4 \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_N \end{pmatrix} \]
It can be simplified to:
\(\mathbf{Y} = \mathbf{X \beta} + \mathbf{\epsilon}\)
The matrix \(\mathbf{X}\) is the design matrix. Let’s make it:
group <- pData(norm_data)$sample_group %>% factor() %>% relevel("control")
dea_model <- model.matrix(~ group)
colnames(dea_model)[1] <- "Intercept"
dea_model## Intercept groupacetaminophen groupaspirin groupdiclofenac groupibuprofen
## 1 1 1 0 0 0
## 2 1 1 0 0 0
## 3 1 1 0 0 0
## 4 1 0 1 0 0
## 5 1 0 1 0 0
## 6 1 0 1 0 0
## 7 1 0 0 1 0
## 8 1 0 0 1 0
## 9 1 0 0 1 0
## 10 1 0 0 0 1
## 11 1 0 0 0 1
## 12 1 0 0 0 1
## 13 1 0 0 0 0
## 14 1 0 0 0 0
## 15 1 0 0 0 0
## 16 1 0 0 0 0
## 17 1 0 0 0 0
## 18 1 0 0 0 0
## attr(,"assign")
## [1] 0 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
So for example, how is the first sample of diclofenac group expressed?
\(Y_{diclofenac,1} = \beta_0*1 + \beta_1*0 + \beta_2*0 + \beta_3*1 + \beta_4*0 + \epsilon = \beta_0 + \beta_3 + \epsilon\)
Intercept is our \(\beta_0\) (i.e. control = the reference group). In a factor variable, the first level is automatically taken as the reference level.
Some models could be quite complicated. You can read more about model design here (Chapter 5).
Now we can fit the model and use the empirical Bayes shrinkage, which will basically improve parameter estimates in our model by “borrowing” information from other probes:
fit <- lmFit(norm_data, dea_model) %>% eBayes()
colnames(fit)## [1] "Intercept" "groupacetaminophen" "groupaspirin" "groupdiclofenac" "groupibuprofen"
Using the topTable() function, we can look at differential expression statistics for each gene in a specific group (in this case, acetaminophen vs control). We call these comparisons contrasts and genes meeting some statistical criteria (FDR = significance, LFC = effect size) as differentially expressed genes (DEGs).
topTable(fit, coef = "groupacetaminophen")Let’s inspect the topTable columns:
logFC: \(log_2(\text{fold-change})\) relative to a reference group (in this casecontrol).AveExpr: average expression in a group (in this caseacetaminophen).t: t-statistic.P.Value: p-value of the differential expression test.adj.P.Val: p-value adjusted for multiple testing issue.B: log-odds that the gene is differentially expressed.
We didn’t specify any parameter to topTable, but if you look at its help page, you can see that it is possible to set p-value adjust method and also thresholds for adjusted p-value, as well as for minimal absolute value of LFC. By default, it won’t filter anything and return DEA statistics for first 10 genes by their B value.
DEA between other sample groups
But what if we want to compare two non-control groups? Well, we need to specify our own contrasts (in this case, aspirin vs ibuprofen and acetaminophen vs diclofenac):
contrasts <- makeContrasts(
aspirin_vs_ibuprofen = groupaspirin - groupibuprofen,
acetaminophen_vs_diclofenac = groupacetaminophen - groupdiclofenac,
levels = dea_model
)
contrasts## Contrasts
## Levels aspirin_vs_ibuprofen acetaminophen_vs_diclofenac
## Intercept 0 0
## groupacetaminophen 0 1
## groupaspirin 1 0
## groupdiclofenac 0 -1
## groupibuprofen -1 0
Behind it is just \((aspirin - control) - (ibuprofen - control) = aspirin - ibuprofen\), so comparison to control (reference) group simply cancels out. Now we can refit our model using the defined contrasts:
fit_c <- contrasts.fit(fit, contrasts = contrasts) %>% eBayes()
colnames(fit_c)## [1] "aspirin_vs_ibuprofen" "acetaminophen_vs_diclofenac"
topTable(fit_c, coef = "aspirin_vs_ibuprofen")MA plot
The MA plot shows the mean of the expression intesity versus the LFCs for all genes tested. The genes that are significantly DE are colored to be easily identified. You can see that besides the technical quality control, MA plot is also usable to check for to evaluate the magnitude of fold changes and how they are distributed relative to mean expression. Generally, we would expect to see significantly DE genes across the full range of expression levels.
We use the ggmaplot() function from ggpubr package, which expects a dataframe with specifically named columns as the input (see ?ggmaplot() for more details). We also specify that genes with adjusted p-value (FDR) below 0.1 and fold-change of 2 (LFC = 1) will be treated as DE.
res <- topTable(fit, coef = "groupacetaminophen", number = Inf) %>%
dplyr::select(SYMBOL, baseMeanLog2 = AveExpr, log2FoldChange = logFC, padj = adj.P.Val)
ggpubr::ggmaplot(
res,
fdr = 0.1,
fc = 2,
genenames = res$SYMBOL,
size = 1
)Volcano plot
Volcano plots are commonly used to display the results of RNA-seq or other omics experiments. A volcano plot is a type of scatterplot that shows statistical significance (p-value) versus magnitude of change (LFC). It enables quick visual identification of genes with large LFCs that are also statistically significant. In a volcano plot, the most upregulated genes are towards the right, the most downregulated genes are towards the left, and the most statistically significant genes are towards the top.
We will use EnhancedVolcano package which produces highly customizable volcano plots in publication-ready quality.
EnhancedVolcano::EnhancedVolcano(
# A dataframe.
toptable = res,
# Vector of gene names.
lab = res$SYMBOL,
# Which column from toptable parameter to use for x and y values.
x = "log2FoldChange",
y = "padj",
pCutoff = 0.1,
# LFC cutoff (applied on x parameter).
FCcutoff = 2,
title = "Volcano plot",
subtitle = "acetoaminophen vs. control",
ylab = "-log10(adjusted p-value)",
# To avoid overlapping gene labels, draw arrows for them.
drawConnectors = TRUE
)ReportingTools
ReportingTools is a great package for HTML reporting of, but not only, DEA. Basically, it outputs a table from the topTable(), but in an interactive HTML form. It is working with many DE analysis packages, as you will see in RNA-seq.
We will iterate over all treatment sample groups and for each we create the HTML report. Because ReportingTools are throwing error when no probe is meeting a filtering criteria, we have to put the code inside the tryCatch() function. The following code could be a bit complicated, so don’t hesitate to ask me.
output_groups <- levels(groups)
output_groups <- output_groups[output_groups != "control"]
rep_theme <- reporting.theme()
lattice.options(default.theme = rep_theme)
for (group in output_groups) {
tryCatch(
{
# This is an object (and a file) to which we are publishing.
de_report <- HTMLReport(
shortName = group,
title = glue("{group} vs. control"),
# reportDirectory = dirname(REPORT_DIR),
basePath = REPORT_DIR
# reportDirectory = REPORT_DIR
)
# We can control what to take from topTable.
# Here, we want top REPORT_N_GENES DE genes (ranked by p-value adjusted by FDR) with minimal LFC of 1 (i.e. two-fold up or down).
# We won't be much conservative and set adjusted p-value cutoff to 0.1. That means we expect 10 % of our DE genes to be false positives.
publish(
fit,
de_report,
eSet = norm_data,
factor = groups,
coef = glue("group{group}"),
n = REPORT_N_GENES,
lfc = REPORT_LFC_THRESHOLD,
pvalueCutoff = REPORT_P_VALUE_THRESHOLD,
adjust.method = REPORT_P_VALUE_ADJUST_METHOD
)
finish(de_report)
cat(glue("\n\n<a href='{basename(REPORT_DIR)}/{group}.html' target='_blank'>Report of DEGs for <b>{group}</b> vs. control</a>", .trim = FALSE))
},
error = function(e) {
cat(glue("\n\n<b>{group}</b> vs. control: {e$message}", .trim = FALSE))
})
}Report of DEGs for acetaminophen vs. control
aspirin vs. control: No probes meet selection criteria. Try changing the log-fold change or p-value cutoff.
Custom reporting of DEGs
Alternatively to ReportingTools, it is also possible to make a custom report template in the Rmd format. You just need to:
- Make a parametrized Rmd file (howto here and here).
- Format the output of
topTable(): join withfData(), add HTML links to ensembl.org (for ENSEMBL IDs) or GeneCards (for SYMBOLs), or any other suitable database. - Send it as a parameter to the Rmd file and use e.g.
DT::datatable()to output pretty tables. - You can also use more parameters: report title (e.g. contrast name), thresholds for DEGs (LFC, FDR, …), sample data, …
By using the custom Rmd file you have a full control over the content, which is otherwise a bit more laborious with ReportingTools, in which, besides using topTable() output, you can also call publish() with any HTML element or another R object (e.g. dataframe) and the output will be appended to the HTMLReport object and appears in the final HTML file.
Lastly, all tables in R can be easily exported to tab-delimited formats (CSV, TSV) or Excel (openxlsx package).
Boxplots
We can use our boxplot functions also for microarray data. In case of plot_boxplots(), we need to convert our ExpressionSet to the long format. And this is a big drawback, because such dataframe will be huge. So don’t convert a full dataset at once, unless you have a lot of RAM.
We convert to the long format only the subset of first 4 probes:
data_long <- exprs(norm_data)[1:4, ] %>%
as.data.frame() %>%
tibble::rownames_to_column("PROBEID") %>%
tidyr::pivot_longer(-PROBEID, names_to = "sample_name", values_to = "E") %>%
dplyr::left_join(fData(norm_data), by = "PROBEID") %>%
dplyr::left_join(pData(norm_data), by = "sample_name")
head(data_long)Now let’s make boxplots:
plot_boxplots(
data_long,
x = "sample_group",
y = "E",
facet_by = "SYMBOL",
color_by = "sample_group",
main = "Affymetrix",
x_lab = "Sample_Group",
y_lab = "log2(expression intensity)",
do_t_test = FALSE
) +
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)As for the qPCR data, it would be nice to see LFC between the treatment groups and the control group. Again, this is doable by subtracting the mean expression intensity of the control group.
Batch effect
Batch effects are technical sources of variation that have been added to the samples during handling or, in our case, by different technical preparation of the experiments. They have to be removed, otherwise they act as confounding elements which can be wrongly interpreted as biological variation.
However, there are two different ways of batch effect correction, aimed for different purposes:
- Correcting the values in expression matrix. This is done prior to exploratory analysis (e.g. PCA, heatmaps) in order to visualize the underlying biological effect. You shouldn’t use the corrected values for DE analysis!
- For DEA, including the batch effect factors in a linear model. In short, most DE models work best on uncorrected data.
These two different ways are nicely summarised here and more rigorously in Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses (2016).
To demonstrate the batch effect, we will join two experiments. The one we have been working with and the second one in which rats were treated with acetaminophen, and which used the same microarray technology as in the former.
The preprocessing is quite similar to what we have already done.
pheno_data_2 <- readr::read_delim(EXPERIMENT_2_SAMPLE_SHEET_FILE, delim = "\t") %>%
dplyr::select(1, 4, 5, 38) %>%
magrittr::set_colnames(c("sample_name", "sample_group", "rat_status", "cel_file"))
head(pheno_data_2)pheno_data_2 <- pheno_data_2 %>%
dplyr::filter(rat_status == "susceptible") %>%
dplyr::mutate(
sample_name = str_replace(sample_name, " 1", ""),
sample_group = dplyr::if_else(str_detect(sample_group, "predose"), "control", "acetaminophen"),
batch = 2L,
rat_status = NULL
)
pheno_data_2We modify the sample sheet from the first experiment to contain only control and acetaminophen samples and concatenate it rowwise with pheno_data_2:
pheno_data_batch <- pData(norm_data) %>%
dplyr::filter(sample_group %in% c("control", "acetaminophen")) %>%
dplyr::mutate(batch = 1L) %>%
dplyr::bind_rows(pheno_data_2) %>%
dplyr::mutate(
index = NULL,
sample_group = factor(sample_group) %>% relevel("control"),
batch = factor(batch),
cel_file = dplyr::if_else(batch == 1, glue("{EXPERIMENT_1_DATA_DIR}/{cel_file}"), glue("{EXPERIMENT_2_DATA_DIR}/{cel_file}"))
) %>%
as.data.frame() %>%
magrittr::set_rownames(.$sample_name)
pheno_data_batchraw_data_batch <- read.celfiles(pheno_data_batch$cel_file)## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392637_94676.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392635_94671.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392459_94436.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1394061_96729.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1394040_96706.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393116_95535.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393085_95478.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1392575_94565.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-57822/GSM1393124_95554.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662214_AP-5-3_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662213_AP-4-2_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662212_AP-2-8_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662211_AP-2-6_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662210_AP-2-3_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662204_A-5-3_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662203_A-4-2_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662202_A-2-8_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662201_A-2-6_Rat230_2_.CEL
## Reading in : /data/persistent/jirinovo/bio-class-deb10/AGE2021/Exercises/E04-microarrays/data/E-GEOD-68065/GSM1662200_A-2-3_Rat230_2_.CEL
sampleNames(raw_data_batch) <- pheno_data_batch$sample_name
metadata <- data.frame(
labelName = colnames(pheno_data_batch),
labelDescription = c(""),
stringsAsFactors = FALSE
)
pheno_data_batch <- AnnotatedDataFrame(data = pheno_data_batch, varMetadata = metadata)
phenoData(raw_data_batch) <- Biobase::combine(phenoData(raw_data_batch), pheno_data_batch)MAplot(raw_data_batch, groups = pData(raw_data_batch)$sample_group, pairs = TRUE)boxplot_colors <- dplyr::if_else(pData(raw_data_batch)$batch == 1, "red", "blue")
boxplot(raw_data_batch, nsample = nrow(raw_data_batch), col = boxplot_colors)
legend("topright", legend = c("1", "2"), title = "Batch", col = c("red", "blue"), pch = 20)fit_plm_batch <- fitProbeLevelModel(raw_data_batch)
RLE(fit_plm_batch)NUSE(fit_plm_batch)Hmm, it looks suspicious, doesn’t it? 🤔 Samples from the second batch seem to have a systematically shifted probe intensity.
norm_data_batch <- rma(raw_data_batch)## Background correcting
## Normalizing
## Calculating Expression
At the biological level, you can see that the samples are obviously divided according to batches.
groups_batch <- pData(norm_data_batch)$sample_group
batches <- pData(norm_data_batch)$batch
names(groups_batch) <- sampleNames(norm_data_batch)
names(batches) <- sampleNames(norm_data_batch)
plot_hc(exprs(norm_data_batch), color_by = groups_batch, color_by_lab = "Sample Group")plot_hc(exprs(norm_data_batch), color_by = batches, color_by_lab = "Batch")plot_pca(exprs(norm_data_batch), sample_data = pData(norm_data_batch), n_top_features = 1000, color_by = "sample_group", shape_by = "batch")$plotplot_heatmap(
exprs(norm_data_batch),
n_top_features = 1000,
z_score = TRUE,
column_annotation = dplyr::select(pData(norm_data_batch), sample_group, batch),
title = "Affymetrix batch effect",
legend_title = "z-score",
show_row_names = FALSE
)The batch effect is quite obvious from the plots above.
Correcting expression values for batch effect
Let’s correct the batch effect with the ComBat() function from sva package. We will specify how are samples divided to batches and what is our experimental design. Using this information, ComBat() will try to remove the batch effect while preserving the biological one.
norm_data_batch_combat <- norm_data_batch
exprs(norm_data_batch_combat) <- ComBat(exprs(norm_data_batch_combat), batch = batches, mod = model.matrix(~ groups_batch))Let’s look again at the biological exploratory analysis. The batch effect seems to be corrected 🙂
plot_hc(exprs(norm_data_batch_combat), color_by = groups_batch, color_by_lab = "Sample Group")plot_hc(exprs(norm_data_batch_combat), color_by = batches, color_by_lab = "Batch")plot_pca(exprs(norm_data_batch_combat), sample_data = pData(norm_data_batch_combat), n_top_features = 1000, color_by = "sample_group", shape_by = "batch")$plotplot_heatmap(
exprs(norm_data_batch_combat),
n_top_features = 1000,
z_score = TRUE,
column_annotation = dplyr::select(pData(norm_data_batch_combat), sample_group, batch),
title = "Affymetrix batch effect corrected with ComBat",
legend_title = "z-score",
show_row_names = FALSE
)It is not perfect, but it seems that samples are now correctly divided by sample group.
Alternatively, an another method could be used for batch effect correction:
norm_data_batch_limma <- norm_data_batch
exprs(norm_data_batch_limma) <- limma::removeBatchEffect(exprs(norm_data_batch_limma), batch = batches, design = model.matrix(~ groups_batch))The results are quite similar to ComBat():
plot_hc(exprs(norm_data_batch_limma), color_by = groups_batch, color_by_lab = "Sample Group")plot_hc(exprs(norm_data_batch_limma), color_by = batches, color_by_lab = "Batch")plot_pca(exprs(norm_data_batch_limma), sample_data = pData(norm_data_batch_limma), n_top_features = 1000, color_by = "sample_group", shape_by = "batch")$plotplot_heatmap(
exprs(norm_data_batch_limma),
n_top_features = 1000,
z_score = TRUE,
column_annotation = dplyr::select(pData(norm_data_batch_limma), sample_group, batch),
title = "Affymetrix batch effect corrected with limma",
legend_title = "z-score",
show_row_names = FALSE
)Taking batch effect into account in DEA
Now we take the batch effect into account in the linear model used for DEA, adding it as an another factor (covariate):
dea_model_batch <- model.matrix(~ groups_batch + batches)
colnames(dea_model_batch)[1] <- "Intercept"
dea_model_batch## Intercept groups_batchacetaminophen batches2
## 1 1 1 0
## 2 1 1 0
## 3 1 1 0
## 4 1 0 0
## 5 1 0 0
## 6 1 0 0
## 7 1 0 0
## 8 1 0 0
## 9 1 0 0
## 10 1 1 1
## 11 1 1 1
## 12 1 1 1
## 13 1 1 1
## 14 1 1 1
## 15 1 0 1
## 16 1 0 1
## 17 1 0 1
## 18 1 0 1
## 19 1 0 1
## attr(,"assign")
## [1] 0 1 2
## attr(,"contrasts")
## attr(,"contrasts")$groups_batch
## [1] "contr.treatment"
##
## attr(,"contrasts")$batches
## [1] "contr.treatment"
fit_batch <- lmFit(norm_data_batch, dea_model_batch) %>% eBayes()
colnames(fit_batch)## [1] "Intercept" "groups_batchacetaminophen" "batches2"
We are assuming that effects of groups_batch and batches are additive, and so we can ask e.g. whether is there difference between ibuprofen and control regardless the batch, which is assumed to have the same effect for all sample groups. In other words, the batch effect is now “eaten” by the batches covariate, and so we observe the correct statistics for the main biological effect of used drug (the groups_batch covariate).
Cleanup
All R objects can be saved in Rds format with saveRDS(), and loaded with readRDS(). We save the norm_data and fit objects, because you will use them in the assignment:
saveRDS(norm_data, file = here(BASE_DIR, "norm_data.Rds"))
saveRDS(fit, here(BASE_DIR, "fit.Rds"))save(list = ls(all.names = TRUE), file = here(BASE_DIR, "microarrays.RData"), envir = environment())
warnings()
traceback()## No traceback available
sessioninfo::session_info()## ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.5 (2021-03-31)
## os Debian GNU/Linux 10 (buster)
## system x86_64, linux-gnu
## ui RStudio
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Europe/Prague
## date 2021-06-01
##
## ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date lib source
## abind 1.4-5 2016-07-21 [1] CRAN (R 4.0.4)
## affxparser 1.62.0 2020-10-27 [1] Bioconductor
## affyio 1.60.0 2020-10-27 [1] Bioconductor
## annotate 1.68.0 2020-10-27 [1] Bioconductor
## AnnotationDbi * 1.52.0 2020-10-27 [1] Bioconductor
## AnnotationFilter 1.14.0 2020-10-27 [1] Bioconductor
## AnnotationForge 1.32.0 2020-10-27 [1] Bioconductor
## ash 1.0-15 2015-09-01 [1] CRAN (R 4.0.4)
## askpass 1.1 2019-01-13 [1] CRAN (R 4.0.4)
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.4)
## backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.4)
## base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.0.4)
## beeswarm 0.3.1 2021-03-07 [1] CRAN (R 4.0.5)
## Biobase * 2.50.0 2020-10-27 [1] Bioconductor
## BiocFileCache 1.14.0 2020-10-27 [1] Bioconductor
## BiocGenerics * 0.36.1 2021-04-16 [1] Bioconductor
## BiocManager 1.30.12 2021-03-28 [1] CRAN (R 4.0.5)
## BiocParallel * 1.24.1 2020-11-06 [1] Bioconductor
## biomaRt 2.46.3 2021-02-09 [1] Bioconductor
## Biostrings * 2.58.0 2020-10-27 [1] Bioconductor
## biovizBase 1.38.0 2020-10-27 [1] Bioconductor
## bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.4)
## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.4)
## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.0.5)
## blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.4)
## bookdown 0.22 2021-04-22 [1] CRAN (R 4.0.5)
## broom 0.7.6 2021-04-05 [1] CRAN (R 4.0.5)
## BSgenome 1.58.0 2020-10-27 [1] Bioconductor
## bslib 0.2.4 2021-01-25 [1] CRAN (R 4.0.4)
## cachem 1.0.4 2021-02-13 [1] CRAN (R 4.0.4)
## Cairo 1.5-12.2 2020-07-07 [1] CRAN (R 4.0.4)
## car 3.0-10 2020-09-29 [1] CRAN (R 4.0.4)
## carData 3.0-4 2020-05-22 [1] CRAN (R 4.0.4)
## Category 2.56.0 2020-10-27 [1] Bioconductor
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.4)
## checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.4)
## circlize 0.4.12 2021-01-08 [1] CRAN (R 4.0.4)
## cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
## clue 0.3-59 2021-04-16 [1] CRAN (R 4.0.5)
## cluster 2.1.2 2021-04-17 [3] CRAN (R 4.0.5)
## codetools 0.2-18 2020-11-04 [3] CRAN (R 4.0.4)
## colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.4)
## ComplexHeatmap * 2.6.2 2020-11-12 [1] Bioconductor
## conflicted * 1.0.4 2019-06-21 [1] CRAN (R 4.0.4)
## crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.4)
## curl 4.3.1 2021-04-30 [1] CRAN (R 4.0.5)
## data.table 1.14.0 2021-02-21 [1] CRAN (R 4.0.4)
## DBI * 1.1.1 2021-01-15 [1] CRAN (R 4.0.4)
## dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.0.5)
## DelayedArray 0.16.3 2021-03-24 [1] Bioconductor
## DESeq2 1.30.1 2021-02-19 [1] Bioconductor
## dichromat 2.0-0 2013-01-24 [1] CRAN (R 4.0.4)
## digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.4)
## dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.4)
## edgeR 3.32.1 2021-01-14 [1] Bioconductor
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
## emo * 0.0.0.9000 2021-02-26 [1] Github (hadley/emo@3f03b11)
## EnhancedVolcano 1.8.0 2020-10-27 [1] Bioconductor
## ensembldb 2.14.1 2021-04-19 [1] Bioconductor
## evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.4)
## extrafont 0.17 2014-12-08 [1] CRAN (R 4.0.4)
## extrafontdb 1.0 2012-06-11 [1] CRAN (R 4.0.4)
## fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.4)
## farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.4)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.4)
## ff 4.0.4 2020-10-13 [1] CRAN (R 4.0.4)
## forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.0.4)
## foreach 1.5.1 2020-10-15 [1] CRAN (R 4.0.4)
## foreign 0.8-81 2020-12-22 [3] CRAN (R 4.0.5)
## Formula 1.2-4 2020-10-16 [1] CRAN (R 4.0.4)
## friendlyeval * 0.1.1 2021-02-26 [1] Github (milesmcbain/friendlyeval@7ea2ed9)
## fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.4)
## genefilter * 1.72.1 2021-01-21 [1] Bioconductor
## geneplotter 1.68.0 2020-10-27 [1] Bioconductor
## generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.4)
## GenomeInfoDb 1.26.7 2021-04-08 [1] Bioconductor
## GenomeInfoDbData 1.2.4 2021-02-26 [1] Bioconductor
## GenomicAlignments 1.26.0 2020-10-27 [1] Bioconductor
## GenomicFeatures 1.42.3 2021-04-01 [1] Bioconductor
## GenomicRanges 1.42.0 2020-10-27 [1] Bioconductor
## GetoptLong 1.0.5 2020-12-15 [1] CRAN (R 4.0.4)
## GGally 2.1.1 2021-03-08 [1] CRAN (R 4.0.5)
## ggalt 0.4.0 2017-02-15 [1] CRAN (R 4.0.4)
## ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.4)
## ggbio 1.38.0 2020-10-27 [1] Bioconductor
## ggcorrplot 0.1.3 2019-05-19 [1] CRAN (R 4.0.4)
## ggplot2 * 3.3.3 2020-12-30 [1] CRAN (R 4.0.4)
## ggpubr 0.4.0 2020-06-27 [1] CRAN (R 4.0.4)
## ggrastr 0.2.3 2021-03-01 [1] CRAN (R 4.0.4)
## ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.0.4)
## ggsignif 0.6.1 2021-02-23 [1] CRAN (R 4.0.4)
## GlobalOptions 0.1.2 2020-06-10 [1] CRAN (R 4.0.4)
## glue * 1.4.2 2020-08-27 [1] CRAN (R 4.0.4)
## GO.db 3.12.1 2021-02-26 [1] Bioconductor
## GOstats 2.56.0 2020-10-27 [1] Bioconductor
## graph 1.68.0 2020-10-27 [1] Bioconductor
## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.4)
## GSEABase 1.52.1 2020-12-11 [1] Bioconductor
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.4)
## haven 2.4.1 2021-04-23 [1] CRAN (R 4.0.5)
## here * 1.0.1 2020-12-13 [1] CRAN (R 4.0.4)
## highr 0.9 2021-04-16 [1] CRAN (R 4.0.5)
## Hmisc 4.5-0 2021-02-28 [1] CRAN (R 4.0.4)
## hms 1.0.0 2021-01-13 [1] CRAN (R 4.0.4)
## htmlTable 2.1.0 2020-09-16 [1] CRAN (R 4.0.4)
## htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.4)
## htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.0.4)
## httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.4)
## hwriter 1.3.2 2014-09-10 [1] CRAN (R 4.0.4)
## IRanges * 2.24.1 2020-12-12 [1] Bioconductor
## iterators 1.0.13 2020-10-15 [1] CRAN (R 4.0.4)
## janitor 2.1.0 2021-01-05 [1] CRAN (R 4.0.4)
## jpeg 0.1-8.1 2019-10-24 [1] CRAN (R 4.0.4)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.5)
## jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.4)
## KernSmooth 2.23-18 2020-10-29 [3] CRAN (R 4.0.5)
## klippy * 0.0.0.9500 2021-02-26 [1] Github (rlesur/klippy@378c247)
## knitr * 1.33 2021-04-24 [1] CRAN (R 4.0.5)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.4)
## lattice * 0.20-41 2020-04-02 [1] CRAN (R 4.0.4)
## latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.0.4)
## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.4)
## lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
## limma * 3.46.0 2020-10-27 [1] Bioconductor
## locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.4)
## lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.0.4)
## magick 2.7.1 2021-03-20 [1] CRAN (R 4.0.5)
## magrittr * 2.0.1 2020-11-17 [1] CRAN (R 4.0.4)
## maps 3.3.0 2018-04-03 [1] CRAN (R 4.0.4)
## MASS 7.3-53.1 2021-02-12 [3] CRAN (R 4.0.5)
## Matrix 1.3-2 2021-01-06 [3] CRAN (R 4.0.5)
## MatrixGenerics 1.2.1 2021-01-30 [1] Bioconductor
## matrixStats 0.58.0 2021-01-29 [1] CRAN (R 4.0.4)
## memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.4)
## mgcv * 1.8-35 2021-04-18 [3] CRAN (R 4.0.5)
## modelr 0.1.8 2020-05-19 [1] CRAN (R 4.0.4)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.4)
## nlme * 3.1-152 2021-02-04 [3] CRAN (R 4.0.5)
## nnet 7.3-15 2021-01-24 [3] CRAN (R 4.0.5)
## oligo * 1.54.1 2020-11-04 [1] Bioconductor
## oligoClasses * 1.52.0 2020-10-27 [1] Bioconductor
## openssl 1.4.4 2021-04-30 [1] CRAN (R 4.0.5)
## openxlsx 4.2.3 2020-10-27 [1] CRAN (R 4.0.4)
## org.Rn.eg.db * 3.12.0 2021-02-26 [1] Bioconductor
## OrganismDbi 1.32.0 2020-10-27 [1] Bioconductor
## patchwork * 1.1.1 2020-12-17 [1] CRAN (R 4.0.4)
## pd.rat230.2 * 3.12.0 2021-02-26 [1] Bioconductor
## PFAM.db 3.12.0 2021-02-26 [1] Bioconductor
## pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.4)
## plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.4)
## png 0.1-7 2013-12-03 [1] CRAN (R 4.0.4)
## preprocessCore 1.52.1 2021-01-08 [1] Bioconductor
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.4)
## progress 1.2.2 2019-05-16 [1] CRAN (R 4.0.4)
## proj4 1.0-10.1 2021-01-26 [1] CRAN (R 4.0.4)
## ProtGenerics 1.22.0 2020-10-27 [1] Bioconductor
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.4)
## R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.4)
## R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.4)
## R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.4)
## R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.4)
## rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.0.4)
## rat2302.db * 3.2.3 2021-02-26 [1] Bioconductor
## RBGL 1.66.0 2020-10-27 [1] Bioconductor
## RColorBrewer * 1.1-2 2014-12-07 [1] CRAN (R 4.0.4)
## Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.4)
## RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.0.5)
## readr * 1.4.0 2020-10-05 [1] CRAN (R 4.0.4)
## readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.4)
## renv 0.13.2 2021-03-30 [1] CRAN (R 4.0.5)
## ReportingTools * 2.30.2 2021-03-08 [1] Bioconductor
## reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
## reshape 0.8.8 2018-10-23 [1] CRAN (R 4.0.4)
## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.4)
## Rgraphviz 2.34.0 2020-10-27 [1] Bioconductor
## rio 0.5.26 2021-03-01 [1] CRAN (R 4.0.4)
## rjson 0.2.20 2018-06-08 [1] CRAN (R 4.0.4)
## rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.5)
## rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
## rmdformats * 1.0.2 2021-04-19 [1] CRAN (R 4.0.5)
## rpart 4.1-15 2019-04-12 [3] CRAN (R 4.0.5)
## rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.4)
## Rsamtools 2.6.0 2020-10-27 [1] Bioconductor
## RSQLite * 2.2.7 2021-04-22 [1] CRAN (R 4.0.5)
## rstatix 0.7.0 2021-02-13 [1] CRAN (R 4.0.4)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.4)
## rtracklayer 1.50.0 2020-10-27 [1] Bioconductor
## Rttf2pt1 1.3.8 2020-01-10 [1] CRAN (R 4.0.4)
## rvest 1.0.0 2021-03-09 [1] CRAN (R 4.0.5)
## S4Vectors * 0.28.1 2020-12-09 [1] Bioconductor
## sass 0.3.1 2021-01-24 [1] CRAN (R 4.0.4)
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.4)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.4)
## shape 1.4.5 2020-09-13 [1] CRAN (R 4.0.4)
## snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.0.4)
## stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.4)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.4)
## SummarizedExperiment 1.20.0 2020-10-27 [1] Bioconductor
## survival 3.2-11 2021-04-26 [3] CRAN (R 4.0.5)
## sva * 3.38.0 2020-10-27 [1] Bioconductor
## tibble * 3.1.1 2021-04-18 [1] CRAN (R 4.0.5)
## tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.0.4)
## tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.5)
## tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.0.5)
## utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.5)
## VariantAnnotation 1.36.0 2020-10-27 [1] Bioconductor
## vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.4)
## withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.5)
## xfun 0.22 2021-03-11 [1] CRAN (R 4.0.5)
## XML 3.99-0.6 2021-03-16 [1] CRAN (R 4.0.5)
## xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.4)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.4)
## XVector * 0.30.0 2020-10-27 [1] Bioconductor
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.4)
## zip 2.1.1 2020-08-27 [1] CRAN (R 4.0.4)
## zlibbioc 1.36.0 2020-10-27 [1] Bioconductor
##
## [1] /usr/local/lib/R/site-library
## [2] /usr/lib/R/site-library
## [3] /usr/lib/R/library
HTML rendering
This chunk is not evaluated (eval = FALSE). Otherwise you will probably end up in recursive hell 🤯
library(conflicted)
library(knitr)
library(here)
if (!require(rmdformats)) {
BiocManager::install("rmdformats")
}
# You can set global chunk options. Options set in individual chunks will override this.
opts_chunk$set(warning = FALSE, message = FALSE, eval = TRUE)
opts_knit$set(root.dir = here())
rmarkdown::render(
here("E04-microarrays/microarrays.Rmd"),
output_file = here("E04-microarrays/microarrays.html"),
envir = new.env()
)